Optimizing Performance of HPC Storage Systems
Authors
Abstract
The performance of HPC storage systems depends on a variety of factors. The results of any of the standard storage benchmark suites depend not only on the storage architecture, but also on the type of disk drives, the type and design of the interconnect, and the type and number of clients. In addition, each of the benchmark suites has a number of different parameters and test methodologies that require careful analysis to determine the optimal settings for a successful benchmark run. To reliably benchmark a storage solution, every stage of the solution needs to be analyzed, including block- and file-level performance of the RAID sets, and network and client throughput to the filesystem and metadata servers. For a filesystem to deliver peak performance, there must be a balance between the actual performance of the disk drives, the SAS chain supporting the RAID sets, the RAID implementation used (whether hardware RAID controllers or software MD-RAID), the interconnect, and finally the clients. This paper describes these issues with respect to the Lustre filesystem. The dependence of benchmark results on various parameters is shown. Using a single storage enclosure consisting of 8 RAID sets (8+2 drives each), it is possible to achieve both read and write performance in excess of 6 GB/s, which translates to more than 36 GB/s of measured client-based throughput per rack. This paper focuses on using the obdfilter-survey tool and the IOR benchmark to measure performance at different levels of a Lustre filesystem.
Keywords—HPC Storage, Benchmarking,
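As a concrete illustration of the client-level measurements discussed above, the following minimal sketch drives IOR over MPI and sweeps the transfer size, one of the parameters the abstract notes can change results considerably. It is a sketch under stated assumptions, not the paper's actual methodology: the mount point, process count, and sizes are placeholders, and the result parsing assumes IOR's usual "Max Write:" / "Max Read:" summary lines.

#!/usr/bin/env python3
# Minimal IOR parameter-sweep sketch. Assumes `mpirun` and `ior` are on
# PATH and that /mnt/lustre is the Lustre client mount under test; all
# paths, process counts, and sizes below are illustrative placeholders.
import re
import subprocess

TEST_FILE = "/mnt/lustre/ior_testfile"  # hypothetical test path
NPROCS = 32                             # client processes; tune to your setup

def run_ior(transfer_size, block_size):
    # One write+read IOR pass; returns the summary bandwidths in MiB/s.
    cmd = [
        "mpirun", "-np", str(NPROCS), "ior",
        "-a", "POSIX",        # POSIX I/O API
        "-w", "-r",           # run both the write and the read phase
        "-F",                 # file-per-process workload
        "-e",                 # fsync after the write phase
        "-C",                 # shift ranks on read-back to defeat client caching
        "-t", transfer_size,  # size of each individual I/O call
        "-b", block_size,     # total amount written per process
        "-o", TEST_FILE,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    bw = {}
    # IOR prints summary lines such as "Max Write: 6123.45 MiB/sec ...".
    for op in ("Write", "Read"):
        m = re.search(rf"Max {op}:\s+([\d.]+)\s+MiB/sec", out)
        if m:
            bw[op.lower()] = float(m.group(1))
    return bw

if __name__ == "__main__":
    for tsize in ("64k", "1m", "4m"):
        print(tsize, run_ior(tsize, "4g"))

At the backend level, obdfilter-survey (part of the Lustre IOKit) could be wrapped in the same way, with its parameters such as thread and object counts passed through the environment rather than as command-line flags.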
Similar resources
DynIMS: A Dynamic Memory Controller for In-memory Storage on HPC Systems
In order to boost the performance of data-intensive computing on HPC systems, in-memory computing frameworks, such as Apache Spark and Flink, use local DRAM for data storage. Optimizing the memory allocation to data storage is critical to delivering performance to traditional HPC compute jobs and throughput to data-intensive applications sharing the HPC resources. Current practices that statical...
Optimizing Data Locality between the Swift Parallel Programming System and the FusionFS Distributed File System
Many high-performance computing (HPC) systems use a centralized storage system that is separate from the compute system. This approach is not going to be scalable as we seek to achieve exa-scale performance [6]. Distributed file systems can provide the scalability needed for exa-scale computing. FusionFS is a file system designed for HPC systems that achieves scalability in part by removi...
Checkpoint/restart in practice: When 'simple is better'
Efficient use of high-performance computing (HPC) installations critically relies on effective methods for fault tolerance. The most commonly used method is checkpoint/restart, where an application writes periodic checkpoints of its state to stable storage that it can restart from in the case of a failure. Despite the prevalence of checkpoint/restart, it is still not very we...
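The checkpoint/restart pattern this snippet describes can be made concrete with a short application-level sketch. This is a generic illustration, not the paper's implementation; the file name, interval, and pickled state are assumptions made for the example. Writing to a temporary file and renaming it keeps the last good checkpoint intact if a failure hits mid-write.

# Generic application-level checkpoint/restart sketch (illustrative only).
import os
import pickle

CKPT = "state.ckpt"       # hypothetical checkpoint file on stable storage
CKPT_INTERVAL = 100       # checkpoint every 100 iterations

def save_checkpoint(state):
    # Write to a temp file, fsync, then rename: os.replace is atomic on
    # POSIX, so a crash mid-write never leaves a truncated checkpoint.
    tmp = CKPT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # force the data down to stable storage
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)             # restart after a failure
    return {"iteration": 0, "result": 0}      # fresh start

state = load_checkpoint()
for i in range(state["iteration"], 1000):
    state["result"] += i                      # stand-in for real computation
    state["iteration"] = i + 1
    if state["iteration"] % CKPT_INTERVAL == 0:
        save_checkpoint(state)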
GPU Erasure Coding for Campaign Storage
High-performance computing (HPC) demands high bandwidth and low latency in I/O performance, leading to the development of storage systems and I/O software components that strive to provide ever-greater performance. However, capital and energy budgets along with increasing storage capacity requirements have motivated the search for lower cost, large storage systems for HPC. With Burst Buff...
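For readers new to erasure coding, the toy sketch below shows the core idea with a single XOR parity block (RAID-5 style redundancy). The paper targets GPU-accelerated codes, typically Reed-Solomon with multiple parity blocks; this example only demonstrates why k data blocks plus parity survive the loss of any one block.

# Toy single-parity erasure code: k data blocks plus one XOR parity block.
from functools import reduce

def xor_blocks(blocks):
    # Column-wise XOR of equal-sized byte blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"block-A1", b"block-B2", b"block-C3"]  # k = 3 equal-sized data blocks
parity = xor_blocks(data)                       # one redundant parity block

# Lose any single data block: XOR of the survivors and the parity recovers it.
lost = data[1]
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == lost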
High-performance IO
Storage is becoming key in HPC systems, especially as Exascale systems arrive. The amount of data needed to solve the coming HPC challenges will not fit in memory, so storage systems need to keep pace with computing improvements; otherwise Exascale machines will waste energy waiting for the storage system to deliver the needed data. This research line investigates several path...